Add pass hoisting RT.await_future out of scf.forall loops #748

andidr · 2024-03-15T10:26:51Z

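The new pass hoists `RT.await_future` operations out of `scf.forall` loops when their results are yielded by the loop (i.e., inserted into a shared output by `tensor.parallel_insert_slice`). This avoids over-synchronization of data-flow tasks: instead of awaiting each task in the iteration that created it, the loop stores the futures in a tensor of futures, and a second loop awaits them and inserts the results.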

E.g., the following IR:

scf.forall (%arg) in (16)
  shared_outs(%o1 = %sometensor, %o2 = %someothertensor)
  -> (tensor<...>, tensor<...>)
{
  ...
  %rph = "RT.build_return_ptr_placeholder"() :
    () -> !RT.rtptr<!RT.future<tensor<...>>>
  "RT.create_async_task"(..., %rph, ...) { ... } : ...
  %future = "RT.deref_return_ptr_placeholder"(%rph) :
    (!RT.rtptr<!RT.future<...>>) -> !RT.future<tensor<...>>
  %res = "RT.await_future"(%future) : (!RT.future<tensor<...>>) -> tensor<...>
  ...
  scf.forall.in_parallel {
    ...
    tensor.parallel_insert_slice %res into %o1[..., %arg, ...] [...] [...] :
      tensor<...> into tensor<...>
    ...
  }
}

is transformed into:

%tensoroffutures = tensor.empty() : tensor<16x!RT.future<tensor<...>>>

%futs:2 = scf.forall (%arg) in (16)
  shared_outs(%otfut = %tensoroffutures, %o2 = %someothertensor)
  -> (tensor<16x!RT.future<tensor<...>>>, tensor<...>)
{
  ...
  %rph = "RT.build_return_ptr_placeholder"() :
    () -> !RT.rtptr<!RT.future<tensor<...>>>
  "RT.create_async_task"(..., %rph, ...) { ... } : ...
  %future = "RT.deref_return_ptr_placeholder"(%rph) :
    (!RT.rtptr<!RT.future<...>>) -> !RT.future<tensor<...>>
  %wrappedfuture = tensor.from_elements %future :
    tensor<1x!RT.future<tensor<...>>>
  ...
  scf.forall.in_parallel {
    ...
    tensor.parallel_insert_slice %wrappedfuture into %otfut[%arg] [1] [1] :
      tensor<1x!RT.future<tensor<...>>> into tensor<16x!RT.future<tensor<...>>>
    ...
  }
}

scf.forall (%arg) in (16) shared_outs(%o = %sometensor) -> (tensor<...>) {
  %future = tensor.extract %futs#0[%arg] :
    tensor<16x!RT.future<tensor<...>>>
  %res = "RT.await_future"(%future) : (!RT.future<tensor<...>>) -> tensor<...>
  scf.forall.in_parallel {
    tensor.parallel_insert_slice %res into %o[..., %arg, ...] [...] [...] :
      tensor<...> into tensor<...>
  }
}
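The effect of the transformation on synchronization can be illustrated outside of MLIR with a plain C++ analogy, in which `std::async` stands in for `RT.create_async_task` and `std::future::get()` for `RT.await_future`. This is a hedged sketch of the scheduling pattern only, not of the pass itself; the functions `computeTile`, `awaitInsideLoop` and `hoistedAwait` are hypothetical.

```cpp
#include <cassert>
#include <future>
#include <vector>

// Hypothetical stand-in for the body of one data-flow task.
int computeTile(int i) { return i * i; }

// Before hoisting: each iteration awaits its own task right after creating
// it, so task i+1 is only launched once task i has completed.
std::vector<int> awaitInsideLoop(int n) {
  std::vector<int> out(n);
  for (int i = 0; i < n; ++i) {
    std::future<int> fut = std::async(std::launch::async, computeTile, i);
    out[i] = fut.get();  // blocks the loop: over-synchronization
  }
  return out;
}

// After hoisting: the first loop only collects futures (the analogue of the
// tensor of futures); the second loop awaits them, so the tasks can run
// concurrently.
std::vector<int> hoistedAwait(int n) {
  std::vector<std::future<int>> futures;
  futures.reserve(n);
  for (int i = 0; i < n; ++i)
    futures.push_back(std::async(std::launch::async, computeTile, i));

  std::vector<int> out(n);
  for (int i = 0; i < n; ++i)
    out[i] = futures[i].get();  // await in a separate loop
  return out;
}
```

Both functions compute the same values; only the point at which the loop waits differs. In the IR, the second loop is still needed because `tensor.parallel_insert_slice` into the original shared output requires the awaited tensor, not the future.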

compilers/concrete-compiler/compiler/lib/Dialect/FHELinalg/Transforms/Tiling.cpp

compilers/concrete-compiler/compiler/include/concretelang/Dialect/RT/Transforms/Passes.td

compilers/concrete-compiler/compiler/lib/Dialect/RT/Transforms/HoistAwaitFuturePass.cpp

...compiler/compiler/lib/Conversion/MLIRLowerableDialectsToLLVM/MLIRLowerableDialectsToLLVM.cpp

...ete-compiler/compiler/lib/Conversion/TFHEGlobalParametrization/TFHEGlobalParametrization.cpp

BourgerieQuentin · 2024-04-04T15:03:51Z

Could be good also to have check-tests for the hoisting pass

…ecific code

This introduces a new function `normalizeInductionVar()` to the static loop utility code in `concretelang/Analysis/StaticLoops.h`, containing the IV normalization code extracted from the batching code, and changes the batching code to use the factored-out function.
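Induction-variable (IV) normalization rewrites a loop running from `lb` to `ub` with step `step` into a loop over `0 .. tripCount` with unit step, recovering the original value as `lb + iv * step`. The C++ sketch below shows only this arithmetic for positive steps; it does not reproduce the signature or the MLIR rewriting performed by `normalizeInductionVar()`, and the names `NormalizedLoop` and `normalize` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of `for (iv = lb; iv < ub; iv += step)` after
// normalization to `for (niv = 0; niv < tripCount; ++niv)`.
struct NormalizedLoop {
  int64_t lb;
  int64_t step;
  int64_t tripCount;

  // Maps a normalized induction variable back to the original one.
  int64_t originalIV(int64_t niv) const { return lb + niv * step; }
};

// Assumes step > 0; an empty range yields a trip count of 0.
NormalizedLoop normalize(int64_t lb, int64_t ub, int64_t step) {
  int64_t tripCount = ub > lb ? (ub - lb + step - 1) / step : 0;
  return NormalizedLoop{lb, step, tripCount};
}
```

For `lb = 3`, `ub = 20` and `step = 4`, the normalized loop has 5 iterations, and normalized iteration 4 corresponds to the original value 19.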

…ersion patterns

Some of the TFHE to Concrete conversion patterns implicitly assume that operands are ciphertexts, and thus that the converted types have more dimensions than the original types. However, for non-ciphertext types, the number of dimensions must be the same before and after the conversion. This commit adds a check to the respective conversion patterns that triggers a simple, rank-preserving type conversion for non-ciphertext types.
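The distinction made by this fix can be modeled with shapes alone: converting a tensor of ciphertexts adds a dimension, whereas any other element type must keep its rank. In this hedged C++ sketch, the `Type` struct, the `isCiphertext` flag and the trailing dimension of size `lweSize` are hypothetical simplifications, not the actual TFHE or Concrete types.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified stand-in for a ranked tensor type.
struct Type {
  std::vector<int64_t> shape;
  bool isCiphertext;  // true if the element type is a ciphertext
};

// Ciphertext tensors gain one trailing dimension holding the ciphertext
// representation; all other element types keep the same number of
// dimensions.
std::vector<int64_t> convertShape(const Type &t, int64_t lweSize) {
  std::vector<int64_t> shape = t.shape;
  if (t.isCiphertext)
    shape.push_back(lweSize);
  return shape;
}
```

A pattern that unconditionally assumed the first branch would produce a wrongly-ranked type for a plain integer tensor, which is what the added check prevents.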

andidr · 2024-04-08T13:43:39Z

Could be good also to have check-tests for the hoisting pass

Done.

…th nested blocks

The current scheme used by reinstantiating conversion patterns in `lib/Conversion/Utils/Dialects` for operations with blocks is to create a new operation with empty blocks, move the operations from the old blocks into the new ones, and then replace any references to block arguments. However, updating the types of block arguments in place leaves conversion patterns for operations nested in the blocks unable to determine the original types of values from before the update. This change uses proper signature conversion for block arguments, so that the original types of block arguments with converted types are preserved, while the new types are made available through the dialect conversion infrastructure via the respective adaptors.

andidr requested review from BourgerieQuentin and antoniupop March 15, 2024 10:26

cla-bot bot added the cla-signed label Mar 15, 2024

antoniupop approved these changes Mar 18, 2024

andidr force-pushed the andi/tiling-optimizations branch 3 times, most recently from 7b07f30 to edc871d April 4, 2024 09:55

BourgerieQuentin approved these changes Apr 4, 2024

andidr added 11 commits April 8, 2024 12:02

feat(compiler): Add support for tiling of FHELinalg.apply_lookup_table

6b73879

feat(compiler): Add new action dump-fhe-df-parallelized

5c81882

This adds a new option `dump-fhe-df-parallelized` to `concretecompiler` that dumps the IR after the generation of data-flow tasks.

refactor(compiler): Use reinstantiating conversion patterns for RT operations

93bb849

feat(compiler): Declare RT futures usable as element types for memrefs

74efe8b

refactor(compiler): Make type conversion in scalar FHE to TFHE conversion recursive

a855e2b

refactor(compiler): Make type conversion in TFHE global parametrization recursive

9216c61

refactor(compiler): Make type conversion in RT task bufferization recursive

f668e82

feat(compiler): Add support for nested Memrefs in memory usage estimator

0c7e3a3

feat(compiler): Add support for tensor.{from_elements,dim} operations in TFHE passes

48d919b
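Several of these commits make type conversion recursive, so that types nested inside other types (for example, a future of a ciphertext used as the element type of a memref) are converted as well. The C++ sketch below illustrates the idea on a tiny hypothetical type tree; the `Kind` values and the `convert` and `print` functions are assumptions for illustration and do not mirror the MLIR `TypeConverter` API.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical type tree: leaves are integer or ciphertext types; futures
// and memrefs wrap an element type.
enum class Kind { Integer, Ciphertext, LweBuffer, Future, Memref };

struct Type {
  Kind kind;
  std::shared_ptr<Type> element;  // set only for Future and Memref
};

std::shared_ptr<Type> leaf(Kind k) {
  return std::make_shared<Type>(Type{k, nullptr});
}

std::shared_ptr<Type> wrap(Kind k, std::shared_ptr<Type> elem) {
  return std::make_shared<Type>(Type{k, std::move(elem)});
}

// Recursive conversion: rewrite ciphertext leaves and rebuild every
// container around its converted element type.
std::shared_ptr<Type> convert(const std::shared_ptr<Type> &t) {
  switch (t->kind) {
  case Kind::Ciphertext:
    return leaf(Kind::LweBuffer);
  case Kind::Future:
  case Kind::Memref:
    return wrap(t->kind, convert(t->element));
  default:
    return t;  // other leaves are left unchanged
  }
}

// Renders a type for inspection, e.g. "memref<future<lwe>>".
std::string print(const std::shared_ptr<Type> &t) {
  switch (t->kind) {
  case Kind::Integer: return "int";
  case Kind::Ciphertext: return "ct";
  case Kind::LweBuffer: return "lwe";
  case Kind::Future: return "future<" + print(t->element) + ">";
  case Kind::Memref: return "memref<" + print(t->element) + ">";
  }
  return "?";
}
```

A converter that only inspected the outermost type would see a memref of futures and leave the ciphertext nested inside it untouched.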

andidr force-pushed the andi/tiling-optimizations branch from edc871d to c09e11c April 8, 2024 13:43

andidr added 3 commits April 8, 2024 15:50

feat(compiler): Add support for dynamically-sized memrefs in lowering patterns for RT tasks

999c9a9

feat(compiler): Add support for various memref operations for RT task bufferization

fd513f1

This adds support for `memref.alloc`, `memref.load`, `memref.store`, `memref.copy` and `memref.subview` to the RT task bufferization pass.

andidr force-pushed the andi/tiling-optimizations branch from c09e11c to 7430587 April 8, 2024 13:54

andidr added 3 commits April 8, 2024 16:16

test(compiler): Add tests for tiling generating partial tiles

7cf5483

test(compiler): Add check tests for pass hoisting RT.await_future operations

f506f5f
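Tiling an iteration space whose size is not a multiple of the tile size produces a last, smaller tile, the case targeted by the commit "Add tests for tiling generating partial tiles". The hedged C++ sketch below enumerates tile offsets and sizes along a single dimension; `Tile` and `tiles` are hypothetical names, and this is not the compiler's tiling code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct Tile {
  int64_t offset;
  int64_t size;
};

// Splits [0, n) into tiles of at most tileSize elements; the last tile is
// partial whenever n is not a multiple of tileSize. Assumes tileSize > 0.
std::vector<Tile> tiles(int64_t n, int64_t tileSize) {
  std::vector<Tile> result;
  for (int64_t offset = 0; offset < n; offset += tileSize)
    result.push_back(Tile{offset, std::min(tileSize, n - offset)});
  return result;
}
```

With `n = 10` and `tileSize = 4`, this yields tiles at offsets 0, 4 and 8, the last one holding only 2 elements.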

andidr force-pushed the andi/tiling-optimizations branch from 7430587 to f506f5f April 8, 2024 14:16

andidr force-pushed the andi/tiling-optimizations branch from 278c9dc to f506f5f April 9, 2024 12:54

BourgerieQuentin approved these changes Apr 9, 2024

andidr merged commit f506f5f into main Apr 9, 2024
52 of 56 checks passed

andidr deleted the andi/tiling-optimizations branch April 9, 2024 13:44

andidr mentioned this pull request Apr 11, 2024

Add support for tiling of more FHELinalg operations #779

Merged
